AITopics | formal proof

Collaborating Authors

formal proof

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Neural Information Processing SystemsJun-11-2026, 18:29:44 GMT

Formal reasoning and automated theorem proving constitute a challenging subfield of machine learning, in which machines are tasked with proving mathematical theorems using formal languages like Lean. A formal verification system can check whether a formal proof is correct or not almost instantaneously, but generating a completely correct formal proof with large language models (LLMs) remains a formidable task. The usual approach in the literature is to prompt the LLM many times (up to several thousands) until one of the generated proofs passes the verification system.

large language model, logic & formal reasoning, natural language, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.60)

Add feedback

Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing

Moore, Hayden, Shah, Asfahan

arXiv.org Artificial IntelligenceDec-5-2025

Large Language Models (LLMs) have recently emerged as powerful tools for autoformalization. Despite their impressive performance, these models can still struggle to produce grounded and verifiable formalizations. Recent work in text-to-SQL, has revealed that LLMs can be sensitive to paraphrased natural language (NL) inputs, even when high degrees of semantic fidelity are preserved (Safarzadeh, Oroo-jlooyjadid, and Roth 2025). In this paper, we investigate this claim in the autoformalization domain. Specifically, we evaluate the robustness of LLMs generating formal proofs with semantically similar paraphrased NL statements by measuring semantic and compilation validity. Using the formal benchmarks MiniF2F (Zheng, Han, and Polu 2021) and Lean 4 version of ProofNet (Xin et al. 2024), and two modern LLMs, we generate paraphrased natural language statements and cross-evaluate these statements across both models. The results of this paper reveal performance variability across paraphrased inputs, demonstrating that minor shifts in NL statements can significantly impact model outputs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.12784

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

Shen, Ziju, Huang, Naohao, Yang, Fanyi, Wang, Yutong, Gao, Guoxiong, Xu, Tianyi, Jiang, Jiedong, He, Wanyi, Yang, Pu, Sun, Mengzhou, Ju, Haocheng, Wu, Peihao, Dai, Bryan, Dong, Bin

arXiv.org Artificial IntelligenceNov-25-2025

Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on solving college-level mathematics problems. To train REAL-Prover-v1, we developed HERALD-AF, a data extraction pipeline that converts natural language math problems into formal statements, and a new open-source Lean 4 interactive environment (Jixia-interactive) to facilitate synthesis data collection. In our experiments, our prover using only supervised fine-tune achieves competitive results with a 23.7% success rate (Pass@64) on the ProofNet dataset-comparable to state-of-the-art (SOTA) models. To further evaluate our approach, we introduce FATE-M, a new benchmark focused on algebraic problems, where our prover achieves a SOTA success rate of 56.7% (Pass@64).

large language model, logic & formal reasoning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.20613

Country:

Asia (0.46)
North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education > Educational Setting (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Add feedback

A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice

Chase, Zachary, Ito, Shinji, Mehalel, Idan

arXiv.org Machine LearningNov-4-2025

We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal expected regret to be $Θ\left( \sqrt{T K \log (N/K) } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2511.00257

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.86)

Add feedback

A formal proof of an

Neural Information Processing SystemsOct-2-2025, 19:13:42 GMT

First, we would like to thank the reviewers for their helpful feedback. It is difficult/impossible to generate a worst case environment model in many scenarios. MDP, as opposed to the entirety of the agent's dynamics. Writing such worst-case constraints by hand is often feasible. How efficient is our algorithm?

artificial intelligence, machine learning, shield, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Add feedback

Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure

Hattori, Seiji, Matsuzaki, Takuya, Fujiwara, Makoto

arXiv.org Artificial IntelligenceSep-15-2025

This paper proposes a natural language translation method for machine-verifiable formal proofs that leverages the informalization (verbalization of formal language proof steps) and summarization capabilities of LLMs. For evaluation, it was applied to formal proof data created in accordance with natural language proofs taken from an undergraduate-level textbook, and the quality of the generated natural language proofs was analyzed in comparison with the original natural language proofs. Furthermore, we will demonstrate that this method can output highly readable and accurate natural language proofs by applying it to existing formal proof library of the Lean proof assistant.

formal proof, large language model, logic & formal reasoning, (20 more...)

arXiv.org Artificial Intelligence

2509.09726

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

Add feedback

a7968b4339a1b85b7dbdb362dc44f9c4-Supplemental.pdf

Neural Information Processing SystemsAug-15-2025, 15:36:51 GMT

algorithm, dataset, iteration, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.31)

Add feedback

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Ren, Z. Z., Shao, Zhihong, Song, Junxiao, Xin, Huajian, Wang, Haocheng, Zhao, Wanjia, Zhang, Liyue, Fu, Zhe, Zhu, Qihao, Yang, Dejian, Wu, Z. F., Gou, Zhibin, Ma, Shirong, Tang, Hongxuan, Liu, Yuxuan, Gao, Wenjun, Guo, Daya, Ruan, Chong

arXiv.org Artificial IntelligenceJul-21-2025

We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich our evaluation, including 15 selected problems from the recent AIME competitions (years 24-25). Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them. In comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.

large language model, linarith, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2504.21801

Genre:

Research Report (1.00)
Instructional Material (0.87)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs

Yousefzadeh, Roozbeh, Cao, Xuenan

arXiv.org Artificial IntelligenceJul-8-2025

This position paper provides a critical but constructive discussion of current practices in benchmarking and evaluative practices in the field of formal reasoning and automated theorem proving. We take the position that open code, open data, and benchmarks that are complete and error-free will accelerate progress in this field. We identify practices that create barriers to contributing to this field and suggest ways to remove them. We also discuss some of the practices that might produce misleading evaluative information. We aim to create discussions that bring together people from various groups contributing to automated theorem proving, autoformalization, and informal reasoning.

large language model, logic & formal reasoning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2507.04719

Country:

Asia > China > Hong Kong (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs

Cao, Jialun, Lu, Yaojie, Li, Meiziniu, Ma, Haoyang, Li, Haokun, He, Mengda, Wen, Cheng, Sun, Le, Zhang, Hongyu, Qin, Shengchao, Cheung, Shing-Chi, Tian, Cong

arXiv.org Artificial IntelligenceFeb-17-2025

The research in AI-based formal mathematical reasoning has shown an unstop- pable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) by distilling gpt-4o and evaluated against ten open-sourced LLMs, including recent popular DeepSeek-R1. We also fine-tuned several 7~8B small models to achieve comparable performance with Deepseek-R1-671B. Interestingly, we observed that fine-tuning with formal data also enhances mathematics, reasoning, and coding capabilities. Fine-tuned models are released at https: //huggingface.co/fm-universe.

large language model, machine learning, specification, (18 more...)

arXiv.org Artificial Intelligence

2501.16207

Country:

Asia (0.92)
North America > United States (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback